AITopics | lipschitz function

Muon and its variants have shown strong empirical performance in a variety of deep learning tasks. Existing convergence analyses of Muon rely on smoothness assumptions, though arguably the most successful function class for developing deep learning methods (such as AdaGrad, Shampoo, Schedule-Free and more) has been the class of convex and Lipschitz functions. In this paper we question whether the classical convex Lipschitz model is a useful one for understanding Muon. Our answer is no. We show that Muon does not converge on the class of convex and Lipschitz functions, regardless of the choice of learning rate schedule. We also show that error feedback restores convergence of Muon and all the non-Euclidean subgradient methods with momentum. However, this theoretical fix using error feedback degrades the performance of Muon in two representative settings for image classification (CIFAR-10) and language modeling (nanoGPT on FineWeb-Edu 10B). Our conclusion is that convex Lipschitz theory, despite having a prominent role in the design of practical methods for deep learning, is not the best-suited model for Muon. This suggests that Muon's success must come from structure absent from this model, most plausibly related to smoothness.

artificial intelligence, machine learning, muon, (16 more...)

arXiv.org Machine Learning

2605.0898

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

18210aa6209b9adfc97b8c17c3741d95-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 09:02:26 GMT

artificial intelligence, complexity, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Contextual Pricing for Lipschitz Buyers

Neural Information Processing Systems, Mar-16-2026, 19:56:12 GMT

We investigate the problem of learning a Lipschitz function from binary feedback. In this problem, a learner is trying to learn a Lipschitz function $f:[0,1]^d \rightarrow [0,1]$ over the course of $T$ rounds. On round $t$, an adversary provides the learner with an input $x_t$, the learner submits a guess $y_t$ for $f(x_t)$, and learns whether $y_t > f(x_t)$ or $y_t \leq f(x_t)$. The learner's goal is to minimize their total loss $\sum_t\ell(f(x_t), y_t)$ (for some loss function $\ell$). The problem is motivated by contextual dynamic pricing, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price.
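
As a rough illustration of the interaction protocol described in this abstract, the sketch below simulates the rounds in one dimension. Everything beyond the protocol itself is an assumption made for illustration: the names `binary_feedback`, `GridBisectionLearner` and `run_protocol` are hypothetical, the learner is a simple per-cell bisection rather than the paper's algorithm, the loss is taken to be the absolute loss, and the inputs are drawn at random instead of by an adversary.

```python
import random


def binary_feedback(f, x, y):
    """Return True when the guess y exceeds f(x); the learner sees only this bit."""
    return y > f(x)


class GridBisectionLearner:
    """Hypothetical 1-D learner (not the paper's method).

    It splits [0, 1] into equal cells and keeps, for each cell, an interval
    [lo, hi] that is guaranteed to contain f on the whole cell.
    """

    def __init__(self, lipschitz=1.0, cells=20):
        self.cells = cells
        # f can change by at most this much between two points of one cell.
        self.slack = lipschitz / cells
        self.lo = [0.0] * cells
        self.hi = [1.0] * cells

    def _cell(self, x):
        return min(int(x * self.cells), self.cells - 1)

    def guess(self, x):
        c = self._cell(x)
        return 0.5 * (self.lo[c] + self.hi[c])

    def update(self, x, y, too_high):
        c = self._cell(x)
        if too_high:
            # y > f(x), so f < y + slack everywhere in this cell.
            self.hi[c] = min(self.hi[c], y + self.slack)
        else:
            # y <= f(x), so f >= y - slack everywhere in this cell.
            self.lo[c] = max(self.lo[c], y - self.slack)


def run_protocol(f, learner, xs):
    """Play one round per input and return the total absolute loss."""
    total = 0.0
    for x in xs:
        y = learner.guess(x)
        learner.update(x, y, binary_feedback(f, x, y))
        total += abs(f(x) - y)  # absolute loss, one possible choice of the loss
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    target = lambda x: 0.25 + 0.5 * x * x  # 1-Lipschitz on [0, 1], values in [0, 1]
    inputs = [rng.random() for _ in range(5000)]
    loss = run_protocol(target, GridBisectionLearner(), inputs)
    print("average loss per round:", loss / len(inputs))
```

In this sketch each cell's interval halves on every visit until it reaches a width of about twice the slack, so a finer grid trades a smaller per-round error for more rounds spent exploring.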

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Contextual Pricing for Lipschitz Buyers

Jieming Mao, Renato Leme, Jon Schneider

Neural Information Processing Systems, Feb-12-2026, 16:53:17 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, lipschitz function, pricing problem, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

c2201e444d2b22a10ca50116a522b9a9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 18:18:12 GMT

arxiv preprint arxiv, dense model, sparse model, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(2 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

5d9e4a04afb9f3608ccc76c1ffa7573e-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 21:59:12 GMT

Sets and scalars are represented by calligraphic and standard fonts, respectively. Intuitively, if $\Phi(w_0)$ is a $(\mu_\Phi, \nu_\Phi)$-near-isometry, then one would expect $\Phi$ to remain a near-isometry for all nearby points. We start with the basic definition of Hermite polynomial and its properties. A bound on $(2\|v\| + \|\delta v\|)$ is obtained in (A.41). Let $z \in \mathbb{R}^d$ denote a Gaussian random vector.

artificial intelligence, asfollow, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks

Neural Information Processing Systems, Feb-8-2026, 09:15:39 GMT

Clearly, in order for learning to be possible, we must impose some constraints on the size of the function class. One possibility is to bound the number of parameters (i.e., the dimensions of the matrix W), in which case learnability follows from standard VC-dimension or covering number arguments (see Anthony and Bartlett [1999]).

artificial intelligence, complexity, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Apulia > Bari (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Failure of uniform laws of large numbers for subdifferentials and beyond

Lai Tian, Johannes O. Royset

arXiv.org Machine Learning, Nov-21-2025

We provide counterexamples showing that uniform laws of large numbers do not hold for subdifferentials under natural assumptions. Our results apply to random Lipschitz functions and random convex functions with a finite number of smooth pieces. Consequently, they resolve the questions posed by Shapiro and Xu [J. Math. Anal. Appl., 325(2), 2007] in the negative and highlight the obstacles nonsmoothness poses to uniform results.

artificial intelligence, machine learning, uniform law, (18 more...)

arXiv.org Machine Learning

2511.16568

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Contextual Pricing for Lipschitz Buyers

Neural Information Processing Systems, Nov-20-2025, 22:07:09 GMT

We investigate the problem of learning a Lipschitz function from binary feedback. In this problem, a learner is trying to learn a Lipschitz function $f:[0,1]^d \rightarrow [0,1]$ over the course of $T$ rounds. On round $t$, an adversary provides the learner with an input $x_t$, the learner submits a guess $y_t$ for $f(x_t)$, and learns whether $y_t > f(x_t)$ or $y_t \leq f(x_t)$. The learner's goal is to minimize their total loss $\sum_t\ell(f(x_t), y_t)$ (for some loss function $\ell$). The problem is motivated by contextual dynamic pricing, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price.

contextual pricing, lipschitz buyer, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)
